Analysis of the Relationships among Longest Common Subsequences, Shortest Common Supersequences and Patterns and its application on Pattern Discovery in Biological Sequences

Authors

  • Kang Ning
  • Hoong Kee Ng
  • Hon Wai Leong
Abstract

For a set of multiple sequences, their patterns, Longest Common Subsequences (LCS) and Shortest Common Supersequences (SCS) represent different aspects of the sequences' profile. Revealing the relationship between the patterns and the LCS/SCS may provide a deeper view of the patterns. In this paper, we show that patterns, LCS and SCS are closely related to each other. Based on these relations, we propose the PALS algorithms, which discover patterns in a set of biological sequences from LCS and SCS results. Experiments show that the PALS algorithms are superior in efficiency and accuracy on a variety of sequences.
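As a small illustration of how closely LCS and SCS are tied together, the sketch below handles only the textbook case of two sequences, where a shortest common supersequence can be read off the same dynamic-programming table as a longest common subsequence and the lengths satisfy |SCS| = |a| + |b| - |LCS|. This is not the PALS algorithm, which the abstract does not describe, and the general multi-sequence problems are much harder. The function names `lcs_table`, `lcs_and_scs` and `is_subsequence` are hypothetical, chosen for this example.

```python
def lcs_table(a, b):
    """Classic DP table: dp[i][j] = LCS length of a[:i] and b[:j]."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp


def lcs_and_scs(a, b):
    """Backtrack once through the table, emitting an LCS and an SCS together."""
    dp = lcs_table(a, b)
    i, j = len(a), len(b)
    lcs, scs = [], []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            # A matched symbol belongs to both the LCS and the SCS.
            lcs.append(a[i - 1])
            scs.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            # Unmatched symbol of a appears only in the supersequence.
            scs.append(a[i - 1])
            i -= 1
        else:
            # Unmatched symbol of b appears only in the supersequence.
            scs.append(b[j - 1])
            j -= 1
    # Flush whatever prefix remains of either input.
    while i > 0:
        scs.append(a[i - 1])
        i -= 1
    while j > 0:
        scs.append(b[j - 1])
        j -= 1
    return "".join(reversed(lcs)), "".join(reversed(scs))


def is_subsequence(s, t):
    """True if s can be obtained from t by deleting symbols."""
    it = iter(t)
    return all(ch in it for ch in s)


if __name__ == "__main__":
    a, b = "AGGTAB", "GXTXAYB"
    lcs, scs = lcs_and_scs(a, b)
    print(lcs, scs, len(scs) == len(a) + len(b) - len(lcs))
```

Every symbol of the LCS is shared once in the supersequence, which is why the two lengths determine each other in the pairwise case.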

Similar resources

Mining Biological Repetitive Sequences Using Support Vector Machines and Fuzzy SVM

Structural repetitive subsequences are the most important portion of biological sequences and play crucial roles in the corresponding sequence's fold and functionality. The biggest class of repetitive subsequences is "Transposable Elements", which has its own sub-classes depending on the contexts' structures. Much research has been performed to critically determine the structure and function of repetitiv...

Problems Related to Subsequences and Supersequences

We present an algorithm for building the automaton that searches for all non-overlapping occurrences of each subsequence from the set of subsequences. Further, we define Directed Acyclic Supersequence Graph and use it to solve the generalized Shortest Common Supersequence problem, the Longest Common Non-Supersequence problem, and the Longest Consistent Supersequence problem.

Common Subsequences and Supersequences and Their expected Length

Let $f(n, k, l)$ be the expected length of a longest common subsequence of $l$ sequences of length $n$ over an alphabet of size $k$. It is known that there are constants $\gamma_k^{(l)}$ such that $f(n, k, l) \to \gamma_k^{(l)} n$; we show that $\gamma_k^{(l)} = \Theta(k^{1/l - 1})$. Bounds for the corresponding constants for the expected length of a shortest common supersequence are also presented.

DISCOVERY of LONGEST INCREASING SUBSEQUENCES and its VARIANTS using DNA OPERATIONS

The Longest Increasing Subsequence (LIS) and Common Longest Increasing Subsequence (CLIS) are important in many data mining applications. We propose algorithms to discover LIS and CLIS from varied databases. This work finds all increasing subsequences in the given database, increasing subsequences in n sliding window, longest increasing sequences in one or more sequences, decrea...

On the Approximation of Shortest Common Supersequences and Longest Common Subsequences

The problems of finding shortest common supersequences (SCS) and longest common subsequences (LCS) are two well-known NP-hard problems that have applications in many areas, including computational molecular biology, data compression, robot motion planning, scheduling, and text editing. A lot of fruitless effort has been spent in searching for good approximation algorithms for these problem...

Journal title:
  • International journal of data mining and bioinformatics

Volume 5, Issue 6

Pages: -

Publication date: 2011